Comparing human versus automatic feature extraction for fine-grained elementary readability assessment

نویسندگان

  • Yi Ma
  • Ritu Singh
  • Eric Fosler-Lussier
  • Robert Lofthus
چکیده

Early primary children’s literature poses some interesting challenges for automated readability assessment: for example, teachers often use fine-grained reading leveling systems for determining appropriate books for children to read (many current systems approach readability assessment at a coarser whole grade level). In previous work (Ma et al., 2012), we suggested that the fine-grained assessment task can be approached using a ranking methodology, and incorporating features that correspond to the visual layout of the page improves performance. However, the previous methodology for using “found” text (e.g., scanning in a book from the library) requires human annotation of the text regions and correction of the OCR text. In this work, we ask whether the annotation process can be automated, and also experiment with richer syntactic features found in the literature that can be automatically derived from either the humancorrected or raw OCR text. We find that automated visual and text feature extraction work reasonably well and can allow for scaling to larger datasets, but that in our particular experiments the use of syntactic features adds little to the performance of the system, contrary to previous findings.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Fully Convolutional Attention Localization Networks: Efficient Attention Localization for Fine-Grained Recognition

Fine-grained recognition is challenging due to the subtle local inter-class differences versus the large intra-class variations such as poses. A key to address this problem is to localize discriminative parts to extract pose-invariant features. However, ground-truth part annotations can be expensive to acquire. Moreover, it is hard to define parts for many fine-grained classes. This work introd...

متن کامل

Automatic Construction of Large Readability Corpora

This work presents a framework for the automatic construction of large Web corpora classified by readability level. We compare different Machine Learning classifiers for the task of readability assessment focusing on Portuguese and English texts, analysing the impact of variables like the feature inventory used in the resulting corpus. In a comparison between shallow and deeper features, the fo...

متن کامل

Fine-Grained Certainty Level Annotations Used for Coarser-Grained E-Health Scenarios - Certainty Classification of Diagnostic Statements in Swedish Clinical Text

An important task in information access methods is distinguishing factual information from speculative or negated information. Fine-grained certainty levels of diagnostic statements in Swedish clinical text are annotated in a corpus from a medical university hospital. The annotation model has two polarities (positive and negative) and three certainty levels. However, there are many e-health sce...

متن کامل

Inferring Semantics from Collocation Clusters to Represent Verbs and Nouns

Current lexical semantic theories provide representations at a coarse grained level. In this paper, I will provide motivations for a fine grained representation for verbs and. nouns. An initial case study is done to serve as evidence that a more detailed representation is needed for tasks that require high accuracy rates, such as machine translation. An automatic approach to gather fine grained...

متن کامل

Fine-grained wound tissue analysis using deep neural network

Tissue assessment for chronic wounds is the basis of wound grading and selection of treatment approaches. While several image processing approaches have been proposed for automatic wound tissue analysis, there has been a shortcoming in these approaches for clinical practices. In particular, seemingly, all previous approaches have assumed only 3 tissue types in the chronic wounds, while these wo...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2012